Resolving the Family Tree of Placental Mammals
Author
Abstract
Background: Genetic analysis of Escherichia coli O157:H7 strains has shown divergence into two distinct lineages, lineages I and II, that appear to have distinct ecological characteristics, with lineage I strains more commonly associated with human disease. In this study, microarray-based comparative genomic hybridization (CGH) was used to identify genomic differences among 31 E. coli O157:H7 strains that belong to various phage types (PTs) and different lineage-specific polymorphism assay (LSPA) types. Results: A total of 4,084 out of 6,057 ORFs were detected in all E. coli O157:H7 strains and 1,751 were variably present or absent. Based on this data, E. coli O157:H7 strains were divided into three distinct clusters, which consisted of 15 lineage I (LSPA type 111111), four lineage I/II (designated in this study) (LSPA type 211111) and 12 lineage II strains (LSPA 222222, 222211, 222212, and 222221), respectively. Eleven different genomic regions that were dominant in lineage I strains (present in ≥80% of lineage I and absent from ≥ 92% of lineage II strains) spanned segments containing as few as two and up to 25 ORFs each. These regions were identified within E. coli Sakai S-loops # 14, 16, 69, 72, 78, 83, 85, 153 and 286, Sakai phage 10 (S-loops # 91, 92 and 93) and a genomic backbone region. All four lineage I/II strains were of PT 2 and possessed eight of these 11 lineage I-dominant loci. Several differences in virulence-associated loci were noted between lineage I and lineage II strains, including divergence within S-loop 69, which encodes Shiga toxin 2, and absence of the non-LEE encoded effector genes nleF and nleH1-2 and the perC homologue gene pchD in lineage II strains. Conclusion: CGH data suggest the existence of two dominant lineages as well as LSPA type and PT-related subgroups within E. coli O157:H7. 
The genomic composition of these subgroups supports the phylogeny that has been inferred from other methods and further suggests that genomic divergence from an ancestral form and lateral gene transfer have contributed to their evolution. The genomic features identified in this study may contribute to apparent differences in the epidemiology and ecology of strains of different E. coli O157:H7 lineages.

Published: 16 May 2007. BMC Genomics 2007, 8:121. doi:10.1186/1471-2164-8-121. Received: 21 December 2006. Accepted: 16 May 2007. This article is available from: http://www.biomedcentral.com/1471-2164/8/121
© 2007 Zhang et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background
Enterohemorrhagic E. coli (EHEC) are associated with gastrointestinal and systemic illness in humans. This illness can range in severity from uncomplicated diarrhea to hemorrhagic colitis and the sometimes fatal hemolytic uremic syndrome [1-3]. EHEC strains possess a number of common virulence traits, such as the production of one or more types of antigenically distinct Shiga toxins (Stx1 and Stx2), a large plasmid that encodes an enterohemolysin, and a chromosomal gene cluster termed the locus of enterocyte effacement (LEE) that is found in most, but not all, EHEC serotypes [4,5]. E. coli O157:H7 is the EHEC serotype most often associated with disease outbreaks and with the onset of severe disease in the U.S., Canada, Japan, and the U.K. [2,3]. Genomic sequencing of two outbreak-related E. coli O157:H7 strains, Sakai and EDL933, revealed that there are many phage-related sequences and genomic islands scattered throughout the chromosome of this organism and that many of these genetic elements encode potential virulence attributes [6-9]. These E. coli O157:H7-specific genomic segments are dispersed throughout 177 different regions of a common genomic backbone that is shared with the distantly related E. coli K-12. Known as S-loops and O-islands (OI) in Sakai and EDL933 strains, respectively, some of the regions must be responsible for the virulence characteristics that were acquired during evolution of E. coli O157:H7.

E. coli O157:H7 strains are believed to comprise a clonal complex of related genotypes that are found worldwide [10]. It has been suggested that E. coli O157:H7 arose from the enteropathogenic E. coli serotype O55:H7 through sequential acquisition of virulence traits and serotype change [11-13]. A step-wise evolution of E. coli O157:H7 from enteropathogenic E. coli O55:H7 was recently proposed, based on the properties of specific extant strains that carry intermediate characteristics and are presumed to represent intermediates in the evolution of this EHEC serotype [11,13]. The proposed evolutionary pathway includes lysogenization by an stx2-converting phage followed by a shift in serotype from O55 to O157 brought about by acquisition of the O157 gnd-rfb locus [14]. The EHEC large plasmid was then acquired by the organism and the ability to ferment sorbitol was lost. The sorbitol-non-fermenting O157:H7 ancestor was subsequently lysogenized with an stx1-converting phage and, finally, acquired a frameshift mutation in the uidA gene, resulting in loss of β-glucuronidase activity [11]. The validity of this stepwise model is supported by recent comparative genomic hybridization (CGH) studies using E. coli O157:H7 whole genome-based oligonucleotide microarrays [13].

It is well recognized that E. coli O157:H7 populations have a bovine reservoir and that the organism is likely adapted for life in the ruminant gastrointestinal tract [15-18]. Using Octamer-Based Genome Scanning (OBGS), Kim et al. showed that Stx-producing, β-glucuronidase and sorbitol-negative E. coli O157:H7 strains have diverged into two distinct lineages, lineages I and II, and that descendants of these two lineages appear to have distinct ecological characteristics [19,20]. Populations of the two lineages are widespread in cattle in both the U.S. and Australia, suggesting that these two lineages have been disseminated throughout the global cattle population [20]. Analysis of a set of nearly 1,500 E. coli O157:H7 strains showed that lineage I strains are more commonly associated with human disease than lineage II strains, suggesting that there may be differences in virulence characteristics or transmissibility between these two taxonomic groups of E. coli O157:H7 strains [21].

Although high resolution comparative studies have indicated that prophages are associated with divergence of E. coli O157:H7 strains [6], systematic analysis of genetic distinctions between lineage I and lineage II strains has only recently been undertaken. We [22] and others [23] have recently reported that the Q anti-terminator gene found upstream of the stx2 operon in E. coli O157:H7 differs between lineage I and II strains. Possession of the stx2 gene is thought to be associated with the occurrence of more severe disease, such as hemolytic uremic syndrome, caused by EHEC strains [24]. In addition, Dowd and Ishizaki [25] recently used oligonucleotide mini-arrays to compare expression of a set of 610 genes between three lineage I and three lineage II strains, noting differential expression of stx2 as well as a number of other potentially virulence-associated genes under anaerobic growth conditions.
Collectively, these published studies suggest that these lineages are genetically distinct and that lineage-specific genetic differences may be responsible for phenotypic differences between members of these two lineages. To systematically identify lineage-specific genome segments, microarray-based CGH was used in this study to catalogue genomic alterations that are unique to lineage I or lineage II strains. The oligonucleotide microarray was based on the genome sequences of two lineage I, human outbreak-related E. coli O157:H7 strains, Sakai [9] and EDL933 [7], and the nonpathogenic E. coli K12 (MG1655) strain [26], and it was used to probe the genomes of a collection of E. coli O157:H7 strains. Although significant strain-strain variation was observed, our focus was on genome alterations that were conserved within different strains of a given lineage. Regions of divergence identified by CGH were then cloned and sequenced to gain additional insight into the genomic differences between the two lineages. The results of the study show that many lineage-specific differences in genomic content involve genes that are known or potentially virulence-associated. These findings may be used to identify candidate genes that could confer lineage-specific traits related to unique ecological or virulence characteristics.

Results
Validation of microarray data by comparison with sequence data
In the CGH experiments, 6,057 probes from the MWG E. coli O157:H7 array set yielded adequate signals when hybridized with a mixture of labelled DNA from the three reference strains (K12, Sakai, and EDL933), and these probes were used for all subsequent analysis. For E. coli O157:H7 EDL933, 5221/5261 (99.2%) of the probes with 100% identity to the corresponding sequence gave the expected results (Table 1).
Among the 40 probes that were expected to hybridize with DNA from E. coli O157:H7 strain EDL933 but did not, 13 (0.25%) were negative and 27 (0.5%) were uncertain according to the GACK analysis. For the E. coli O157:H7 Sakai strain, only 4951/5335 (93%) of the probes with 100% identity to the corresponding sequence gave the expected results; 39 (0.7%) were negative and 345 (6%) were uncertain based on GACK analysis. However, twenty-one of the probes with 100% identity to E. coli Sakai sequence that did not generate a positive signal with E. coli Sakai DNA were homologous to ORFs in S-loop#108 [9]. This S-loop is equivalent to OI#57 in E. coli O157:H7 EDL933. PCR experiments revealed that the Sakai strain used in this study has a deletion of these ORFs in S-loop#108 while the corresponding OI in EDL933 was intact (data not shown).

Table 1: Summary of BLASTN results of MWG oligonucleotide probes queried against genomes of E. coli O157:H7 strains EDL933 and Sakai, and K-12 strain MG1655

Identity class                Target                    Probe No.
ORFs with 100% identity to    K12                       4269
                              EDL933                    5261
                              Sakai                     5335
                              EDL933 and Sakai          5232
                              EDL933 and K12            3659
                              Sakai and K12             3655
                              K12, EDL933 and Sakai     3654
Less than 100% identity to    K12, EDL933, or Sakai     84
Total probes                                            6057

Genomic variability in lineage I and lineage II E. coli O157:H7 strains
In order to distinguish lineage-specific differences from strain-strain variability, multiple strains belonging to three different genotypic groups were tested. Our strain set included fifteen different LSPA genotype 111111 strains (lineage I), four different LSPA type 211111 strains (designated lineage I/II in this study) and 12 different lineage II strains of LSPA types 222222, 222221, 222212, and 222211. Characteristics of the strains used in the study are presented in Table 2, and data from microarray hybridization experiments with these E. coli O157:H7 strains are presented in the supplemental material [see Additional file 1]. A total of 4,084 of the 6,057 probes hybridized with all E. coli O157:H7 strains tested, indicating that this set of genes likely represents the conserved core genome of the ancestral E. coli O157:H7 population that has been maintained during its evolution. There were 222 probes that hybridized only with DNA from E. coli K12 and not with any of the E. coli O157:H7 strains tested, including two probes (ECs1372 and b1894) that were expected to hybridize with EDL933 and Sakai DNA, based on sequence identity. The remaining 1,751 probes showed significant variability in microarray hybridization signals among E. coli O157:H7 strains (Table 3), and the ORFs that they represent were designated as variably absent or present (VAP). Of these 1,751 VAP, 79 hybridized with only one of the 31 E. coli O157:H7 strains tested and 662 hybridized with all but one of the 31 E. coli O157:H7 strains tested. Initial functional classification of the 1,751 VAP genes showed that 506 (29%) were encoded by prophage or phage-like elements found in the K-12, EDL933 and Sakai genomes and 615 (35%) were located within K-island (KI), O-island (OI), or S-loop genomic islands [7,9,26]. The distribution of VAP genes in the genomes of E. coli EDL933 and Sakai and the percentage of the 31 E. coli O157:H7 strains that were divergent for each gene were plotted (Figures 1 and 2). In this study, "lineage-specific" refers to the presence of single ORFs or ORF clusters exclusively in a given lineage, while "lineage-dominant" refers to the presence of single ORFs or ORF clusters in ≥80% of the strains of one lineage and their absence from ≥90% of strains of other lineages.

Lineage- and phage type-specific and lineage- and phage type-dominant ORFs
A total of 132 of the 1,751 VAP ORFs were either specific or dominant to a lineage, LSPA type or PT (Table 4, Figure 3).

i) S-loop#14/OI#7
Three lineage I and lineage I/II-specific ORFs, ECs0237, ECs0238, and ECs0239, were identified in S-loop#14/OI#7 by CGH (Table 4). The nucleotide sequence [GenBank:EF112439] of this region in the lineage II strain FRIK 920 was homologous to Sakai sequence, except that a stretch of DNA extending from the 3' end of ECs0237 to the 5' end of ECs0242 was missing. The missing ORFs encode rearrangement hot spot (rhs) proteins and hypothetical proteins in E. coli Sakai.

ii) S-loop#16/OI#8
Eight E. coli S-loop#16/OI#8 ORFs were identified as being lineage I and lineage I/II-specific by CGH (Table 4). S-loop#16 corresponds to tandem prophages Sp1 and Sp2 in E. coli Sakai, and the majority of lineage I and lineage I/II-specific ORFs in this region were homologous to prophage genes. Repeated attempts to amplify the divergent region in S-loop#16 by long template PCR with FRIK 920 DNA were unsuccessful.

iii) S-loop#69/OI#45
S-loop#69/OI#45 corresponds to the stx2-converting bacteriophage Sp5 in E. coli Sakai. CGH revealed that this region was not only highly divergent but also showed lineage- and LSPA type-dominant patterns of divergence (Table 4). Among the 31 E. coli O157:H7 strains examined, only lineage I strain 97701 (PT14) did not have a positive signal for stx2 A and B subunit genes. In 97701, other ORFs in this region were also divergent, suggesting that bacteriophage Sp5 was not present in its genome. There were two clusters of lineage and LSPA type divergent ORFs in S-loop#69. The first cluster, consisting of ORFs ECs1160 to ECs1163 located upstream of the stx2 genes in E. coli Sakai, was missing in all four lineage I/II and the 12 lineage II strains but was conserved in all lineage I strains except strain 97701. The ORFs within this cluster encoded putative bacteriophage proteins and hypothetical proteins.
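The "lineage-specific" and "lineage-dominant" definitions given in the text can be sketched as a small classifier over per-strain presence/absence calls. This is only an illustration, not the authors' analysis pipeline. The function name `classify_orf`, the strain labels and the toy data are hypothetical. The sketch also assumes that "specific" means present in every strain of the focal lineage and absent from every other strain, which the text does not spell out, and it uses two groups rather than the three (lineage I, I/II and II) in the study.

```python
from typing import Dict

# Thresholds quoted from the definitions in the text (as percentages).
PRESENT_MIN_PCT = 80  # present in >= 80% of strains of the focal lineage
ABSENT_MIN_PCT = 90   # absent from >= 90% of strains of other lineages


def classify_orf(calls: Dict[str, bool], lineage_of: Dict[str, str], focal: str) -> str:
    """Classify one ORF relative to a focal lineage.

    calls maps strain name -> True if the probe hybridized (ORF present).
    lineage_of maps strain name -> lineage label (hypothetical labels).
    """
    inside = [present for strain, present in calls.items() if lineage_of[strain] == focal]
    outside = [present for strain, present in calls.items() if lineage_of[strain] != focal]
    if not inside or not outside:
        return "unclassified"

    n_present_in = sum(inside)
    n_absent_out = len(outside) - sum(outside)

    # Assumption: "specific" = present in every focal strain, absent elsewhere.
    if n_present_in == len(inside) and n_absent_out == len(outside):
        return "specific"
    # Integer comparison avoids floating-point rounding at the thresholds.
    if (100 * n_present_in >= PRESENT_MIN_PCT * len(inside)
            and 100 * n_absent_out >= ABSENT_MIN_PCT * len(outside)):
        return "dominant"
    return "neither"


# Toy data: 5 hypothetical lineage I strains and 10 hypothetical lineage II strains.
lineage_of = {f"I_{i}": "I" for i in range(5)}
lineage_of.update({f"II_{i}": "II" for i in range(10)})

orf_a = {s: lin == "I" for s, lin in lineage_of.items()}  # all of I, none of II
orf_b = dict(orf_a, I_0=False, II_0=True)                 # 4/5 of I, absent from 9/10 of II
orf_c = dict(orf_a, I_0=False, I_1=False)                 # only 3/5 of I

for name, orf in [("orf_a", orf_a), ("orf_b", orf_b), ("orf_c", orf_c)]:
    print(name, classify_orf(orf, lineage_of, "I"))
```

In practice the calls would come from the GACK present/absent/uncertain output. How uncertain calls should be treated is not covered by this sketch.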
The second cluster of divergent ORFs in S-loop#69/OI#45 consisted of 21 ORFs that were missing in 11 out of 12 lineage II strains and present in all four lineage I/II strains and 14 of 15 lineage I strains. These lineage I-dominant ORFs were located downstream of the stx2 genes, encoded putative bacteriophage proteins and hypothetical proteins, and correspond to the late region of Sp5 of Sakai. PCR primers that flank S-loop#69 were used to amplify the corresponding DNA fragment in the lineage II E. coli strain FRIK 920. The nucleotide sequence of the amplicon showed that Sakai Sp5 prophage is not integrated into the chromosome at this site in E. coli FRIK 920.

iv) S-loop#72/OI#43, 48
S-loop#72 in E. coli Sakai, which corresponds to duplicate OI#43 and OI#48 in E. coli EDL933, consists of the degenerate prophage SpLE1 in Sakai. S-loop#72 and OI#43,48 are also called tellurite resistance- and adherence-conferring islands because they contain genes responsible for these phenotypes [27]. Putative virulence-associated ORFs located outside of the lineage I and lineage I/II-specific cluster, including the urease genes (ECs1321-ECs1327), genes for tellurite resistance (ECs1351-ECs1358), and iha (IrgA homologue adhesin) (ECs1360) [27,28], were found by CGH to be conserved in all E. coli O157:H7 strains tested. However, 12 ORFs within S-loop#72 were lineage I and lineage I/II-specific (Table 4). The nucleotide sequence [GenBank:EF112440] of the FRIK 920 amplification product obtained for this region had high similarity to the E. coli Sakai sequences, except that a 10.8 kb segment extending from the 3' end of ECs1377 to the 5' end of ECs1391 was missing. The missing region includes two putative transposases, ECs1380 and ECs1381, which were not identified by CGH. With the exception of ECs1382, which encodes a HecB-like protein, and ECs1388 (pchD), a PerC-homologue [29], all other lineage I and lineage I/II-specific ORFs in this region encode hypothetical proteins.

Table 2: E. coli O157:H7 strains used in CGH experiments.

Name          Serotype   Phage type   Source   LSPA type   stx1 stx2
97701         O157:H7    14           Human    111111      +
LRH6          O157:H7    14           Human    111111      +
EC20011339    O157:H7    14           Bovine   111111      + +
F1299         O157:H7    14           Bovine   111111      + +
F5            O157:H7    14           Bovine   111111      + +
63154         O157:H7    31           Human    111111      + +
58212         O157:H7    31           Human    111111      + +
F1095         O157:H7    31           Bovine   111111      + +
H4420         O157:H7    87           Bovine   111111      + +
E2328         O157:H7    87           Bovine   111111      + +
ECI-634       O157:H7    87           Bovine   111111      + +
Sakai         O157:H7    32           Human    111111      + +
EDL933        O157:H7    21           Human    111111      + +
EC20000948    O157:H7    14           Human    111111      + +
EC20000958    O157:H7    14           Human    111111      + +
59243         O157:H7    2            Human    211111      +
71074         O157:H7    2            Human    211111      +
EC20030338    O157:H7    2            Human    211111      +
Zap0046       O157:H7    2            Human    211111      +
EC970520      O157:H7    67           Bovine   222222      + +
LRH13         O157:H7    23           Human    222222      + +
R1797         O157:H7    23           Human    222222      + +
EC20020119    O157:H7    23           Bovine   222222      + +
EC2000623     O157:H7    23           Bovine   222222      + +
EC20000703    O157:H7    23           Bovine   222222      + +
FRIK 920      O157:H7    23           Bovine   222222      + +
FRIK1999      O157:H7    23           Bovine   222222      + +
FRIK1985      O157:H7    45           Bovine   222221      + +
FRIK1990      O157:H7    54           Bovine   222222      + +
FRIK2001      O157:H7    54           Bovine   222211      +
EC20000964    O157:H7    74           Human    222212      +

Table 3: Genome ORF variability in 15 lineage I, four lineage I/II and 12 lineage II E. coli O157:H7 strains

Divergent     Group    Phage-related   Phage-unrelated   OI, KI, SL²-related   OI, KI, SL²-unrelated
strain #¹     ORFs     genes           genes             genes                 genes (backbone)
0             4084     689             3395              1077                  3007
1             662      78              584               95                    567
2             190      33              157               43                    147
3–6           291      79              212               90                    201
7–12          161      79              82                91                    70
13–18         141      98              43                97                    44
19–25         136      87              49                97                    39
26–29         91       30              61                52                    39
30            79       22              57                50³                   29
31            222      84              138               190                   32
Total         1751     506             1245              615                   1136

¹ Indicates the number of strains lacking specific ORFs.
² OI = O-island, KI = K-island, SL = S-loop.
³ These ORFs are only from K-islands.

Figure 2: The distribution of divergent genes among 31 E. coli O157:H7 strains as determined in CGH experiments with MWG oligonucleotides. As in Figure 1 except that the genome map of E. coli O157:H7 strain Sakai is used and S-loops and specific ORFs of interest are shown.
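The per-gene quantity plotted in Figures 1 and 2 (the percentage of the 31 strains that were divergent for each gene) and the grouping used in Table 3 can be illustrated with a short sketch. The bin boundaries are copied from Table 3. The function names and the toy presence/absence matrix are hypothetical, and the sketch assumes each probe has already been reduced to a simple present/absent call per strain. As a consistency check on the reported counts, 4,084 conserved + 1,751 VAP + 222 K12-only probes sum to the 6,057 probes analysed.

```python
from collections import Counter
from typing import Dict, List

# Bin edges copied from the "divergent strain #" column of Table 3.
TABLE3_BINS = [(0, 0), (1, 1), (2, 2), (3, 6), (7, 12),
               (13, 18), (19, 25), (26, 29), (30, 30), (31, 31)]


def divergent_count(calls: List[bool]) -> int:
    """Number of strains in which the ORF was called absent (divergent)."""
    return sum(1 for present in calls if not present)


def percent_divergent(calls: List[bool]) -> float:
    """Percentage of strains divergent for this ORF (y-axis of Figures 1 and 2)."""
    return 100.0 * divergent_count(calls) / len(calls)


def table3_bin(n_divergent: int) -> str:
    """Return the Table 3 row label for a given number of divergent strains."""
    for lo, hi in TABLE3_BINS:
        if lo <= n_divergent <= hi:
            return str(lo) if lo == hi else f"{lo}-{hi}"
    raise ValueError(f"{n_divergent} is outside the 0-31 range of Table 3")


def summarize(matrix: Dict[str, List[bool]]) -> Counter:
    """Count ORFs per Table 3 row from an ORF -> per-strain calls matrix."""
    return Counter(table3_bin(divergent_count(calls)) for calls in matrix.values())


# Toy matrix with 31 calls per ORF (hypothetical ORF names and calls).
matrix = {
    "core_orf": [True] * 31,                 # present in every strain
    "all_but_one": [True] * 30 + [False],    # missing from one strain
    "mid_orf": [True] * 21 + [False] * 10,   # missing from ten strains
    "k12_only": [False] * 31,                # absent from all O157:H7 strains
}
counts = summarize(matrix)
print(counts)
print(round(percent_divergent(matrix["mid_orf"]), 1))
```

Under this scheme, ORFs in row 0 correspond to the conserved core, rows 1 to 30 to the VAP set, and row 31 to probes that hybridized only with K12 DNA.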
Similar resources
Towards resolving the interordinal relationships of placental mammals.
Here we show that progress towards a reliable phylogeny for placental mammals at the ordinal level continues apace. We draw especially upon insights from the recent “International Symposium on the Origin of Mammalian Orders” held at The Graduate University of Advanced Study, Hayama, Japan (21–25 July 1998), particularly work not incorporated in the remainder of this issue or published elsewhere...
Mammalian Evolution May not Be Strictly Bifurcating
The massive amount of genomic sequence data that is now available for analyzing evolutionary relationships among 31 placental mammals reduces the stochastic error in phylogenetic analyses to virtually zero. One would expect that this would make it possible to finally resolve controversial branches in the placental mammalian tree. We analyzed a 2,863,797 nucleotide-long alignment (3,364 genes) f...
Making the impossible possible: rooting the tree of placental mammals.
Untangling the root of the evolutionary tree of placental mammals has been nearly an impossible task. The good news is that only three possibilities are seriously considered. The bad news is that all three possibilities are seriously considered. Paleontologists favor a root anchored by Xenarthra (e.g., sloths and anteater), whereas molecular evolutionists have favored the two other possible roo...
Using genomic data to unravel the root of the placental mammal phylogeny.
The phylogeny of placental mammals is a critical framework for choosing future genome sequencing targets and for resolving the ancestral mammalian genome at the nucleotide level. Despite considerable recent progress defining superordinal relationships, several branches remain poorly resolved, including the root of the placental tree. Here we analyzed the genome sequence assemblies of human, arm...
Phylogenetics series Molecules consolidate the placental mammal tree
Deciphering relationships among the orders of placental mammals remains an important problem in evolutionary biology and has implications for understanding patterns of morphological character evolution, reconstructing the ancestral placental genome, and evaluating the role of plate tectonics and dispersal in the biogeographic history of this group. Until recently, both molecular and morphologic...
Genomics, biogeography, and the diversification of placental mammals.
Previous molecular analyses of mammalian evolutionary relationships involving a wide range of placental mammalian taxa have been restricted in size from one to two dozen gene loci and have not decisively resolved the basal branching order within Placentalia. Here, on extracting from thousands of gene loci both their coding nucleotide sequences and translated amino acid sequences, we attempt to ...
Journal title:
- PLoS Biology
Volume 4, issue
Pages -
Publication date: 2006